49 research outputs found

    2DT-GLS: Faster and exception-free scalar multiplication in the GLS254 binary curve

    Get PDF
    We revisit and improve performance of arithmetic in the binary GLS254 curve by introducing the 2DT-GLS scalar multiplication algorithm. The algorithm includes theoretical and practice-oriented contributions of potential independent interest: (i) for the first time, a proof that the GLS scalar multiplication algorithm does not incur exceptions, such that faster incomplete formulas can be used; (ii) faster dedicated atomic formulas that alleviate the cost of precomputation; (iii) a table compression technique that reduces the storage needed for precomputed points; (iv) a refined constant-time scalar decomposition algorithm that is more robust to rounding. We also present the first GLS254 implementation for Armv8. With our contributions, we set new speed records for constant-time scalar multiplication by 34.5%34.5\% and 6%6\% on 64-bit Arm and Intel platforms, respectively

    MOSFHET: Optimized Software for FHE over the Torus

    Get PDF
    Homomorphic encryption is one of the most secure solutions for processing sensitive information in untrusted environments, and there have been many recent advances towards its efficient implementation for the evaluation of linear functions and approximated arithmetic. However, the practical performance when evaluating arbitrary (nonlinear) functions is still a major challenge for HE schemes. The TFHE scheme [Chillotti et al., 2016] is the current state-of-the-art for the evaluation of arbitrary functions, and, in this work, we focus on improving its performance. We divide this paper into two parts. First, we review and implement the main techniques to improve performance or error behavior in TFHE proposed so far. For many, this is the first practical implementation. Then, we introduce novel improvements to several of them and new approaches to implement some commonly used procedures. We also show which proposals can be suitably combined to achieve better results. We provide a single library containing all the reviewed techniques as well as our original contributions. Our implementation is up to 1.2 times faster than previous ones with a similar optimization level, and our novel techniques provide speedups of up to 2.83 times on algorithms such as the Full-Domain Functional Bootstrap (FDFB)

    Verifiable Mix-Nets and Distributed Decryption for Voting from Lattice-Based Assumptions

    Get PDF
    Cryptographic voting protocols have recently seen much interest from practitioners due to their (planned) use in countries such as Estonia, Switzerland, France, and Australia. Practical protocols usually rely on tested designs such as the mixing-and-decryption paradigm. There, multiple servers verifiably shuffle encrypted ballots, which are then decrypted in a distributed manner. While several efficient protocols implementing this paradigm exist from discrete log-type assumptions, the situation is less clear for post-quantum alternatives such as lattices. This is because the design ideas of the discrete log-based voting protocols do not carry over easily to the lattice setting, due to specific problems such as noise growth and approximate relations. In this work, we propose a new verifiable secret shuffle for BGV ciphertexts and a compatible verifiable distributed decryption protocol. The shuffle is based on an extension of a shuffle of commitments to known values which is combined with an amortized proof of correct re-randomization. The verifiable distributed decryption protocol uses noise drowning, proving the correctness of decryption steps in zero-knowledge. Both primitives are then used to instantiate the mixing-and-decryption electronic voting paradigm from lattice-based assumptions. We give concrete parameters for our system, estimate the size of each component and provide implementations of all important sub-protocols. Our experiments show that the shuffle and decryption protocol is suitable for use in real-world e-voting schemes

    Privacy-preserving edit distance computation using secret-sharing two-party computation

    Get PDF
    The edit distance is a metric widely used in genomics to measure the similarity of two DNA chains. Motivated by privacy concerns, we propose a 2PC protocol to compute the edit distance while preserving the privacy of the inputs. Since the edit distance algorithm can be expressed as a mixed-circuit computation, our approach uses protocols based on secret-sharing schemes like Tinier and SPDZ2k\mathbb{Z}_{2^k}; and also daBits to perform domain conversion and edaBits to perform arithmetic comparisons. We modify the Wagner-Fischer edit distance algorithm, aiming at reducing the number of rounds of the protocol, and achieve a flexible protocol with a trade-off between rounds and multiplications. We implement our proposal in the MP-SPDZ framework, and our experiments show that it reduces the execution time respectively by 81\% and 54\% for passive and active security with respect to a baseline implementation in a LAN. The experiments also show that our protocol reduces traffic by two orders of magnitude compared to a BMR-MASCOT implementation

    Two is the fastest prime: lambda coordinates for binary elliptic curves

    Get PDF
    In this work, we present new arithmetic formulas for a projective version of the affine point representation (x,x+y/x),(x,x+y/x), for x≠0,x\ne 0, which leads to an efficient computation of the scalar multiplication operation over binary elliptic curves.A software implementation of our formulas applied to a binary Galbraith-Lin-Scott elliptic curve defined over the field F2254\mathbb{F}_{2^{254}} allows us to achieve speed records for protected/unprotected single/multi-core random-point elliptic curve scalar multiplication at the 127-bit security level. When executed on a Sandy Bridge 3.4GHz Intel Xeon processor, our software is able to compute a single/multi-core unprotected scalar multiplication in 69,50069,500 and 47,90047,900 clock cycles, respectively; and a protected single-core scalar multiplication in 114,800114,800 cycles. These numbers are improved by around 2\% and 46\% on the newer Ivy Bridge and Haswell platforms, respectively, achieving in the latter a protected random-point scalar multiplication in 60,000 clock cycles

    Faster constant-time evaluation of the Kronecker symbol with application to elliptic curve hashing

    Get PDF
    We generalize the Bernstein-Yang (BY) algorithm for constant-time modular inversion to compute the Kronecker symbol, of which the Jacobi and Legendre symbols are special cases. We start by developing a basic and easy-to-implement divstep version of the algorithm defined in terms of full-precision division steps. We then describe an optimized version due to Hamburg over word-sized inputs, similar to the jumpdivstep version of the BY algorithm, and formally verify its correctness. Along the way, we introduce a number of optimizations for implementing both versions in constant time and at high-speed. The resulting algorithms are particularly suitable for the special case of computing the Legendre symbol with dense prime pp, where no efficient addition chain is known for the conventional approach by exponentiation to p−12\frac{p-1}{2}. This is often the case for the base field of popular pairing-friendly elliptic curves. Our high-speed implementation for a range of parameters shows that the new algorithm is up to 40 times faster than the conventional exponentiation approach, and up to 25.7\% faster than the previous state of the art. We illustrate the performance of the algorithm with an application for hashing to elliptic curves, where the observed savings amount to 14.7\% -- 48.1\% when used for testing quadratic residuosity within the SwiftEC hashing algorithm. We also apply our techniques to the CTIDH isogeny-based key exchange, with savings of 3.5--13.5\%

    Faster Homomorphic Encryption over GPGPUs via hierarchical DGT

    Get PDF
    Privacy guarantees are still insufficient for outsourced data processing in the cloud. While employing encryption is feasible for data at rest or in transit, it is not for computation without remarkable performance slowdown. Thus, handling data in plaintext during processing is still required, which creates vulnerabilities that can be exploited by malicious entities. Homomorphic encryption (HE) schemes are natural candidates for computation in the cloud since they enable processing of ciphertexts without any knowledge about the related plaintexts or the decryption key. This work focuses on the challenge of developing an efficient implementation of the BFV HE scheme on CUDA. This is done by combining and adapting different approaches from the literature, namely the double-CRT representation and the Discrete Galois Transform. Moreover, we propose and implement an improved formulation of the DGT inspired by classical algorithms, which computes the transform up to 2.62.6 times faster than the state-of-the-art. By using these approaches, we obtain up to 3.63.6 times faster homomorphic multiplication

    Performance of Hierarchical Transforms in Homomorphic Encryption: A case study on Logistic Regression inference

    Get PDF
    Recent works challenged the Number-Theoretic Transform (NTT) as the most efficient method for polynomial multiplication in GPU implementations of Fully Homomorphic Encryption schemes such as CKKS and BFV. In particular, these works argue that the Discrete Galois Transform (DGT) is a better candidate for this particular case. However, these claims were never rigorously validated, and only intuition was used to argue in favor of each transform. This work brings some light on the dis- cussion by developing similar CUDA implementations of the CKKS cryptosystem, differing only in the underlying transform and related data structure. We ran several experiments and collected perfor- mance metrics in different contexts, ranging from the basic direct comparison between the transforms to measuring the impact of each one on the inference phase of the logistic regression algorithm. Our observations suggest that, despite some specific polynomial ring configurations, the DGT in a stan- dalone implementation does not offer the same performance as the NTT. However, when we consider the entire cryptosystem, we noticed that the effects of the higher arithmetic density of the DGT on other parts of the implementation is substantial, implying a considerable performance improvement of up to 15% on the homomorphic multiplication. Furthermore, this speedup is consistent when we consider a more complex application, indicating that the DGT suits better the target architecture

    Implementing Pairings at the 192-bit Security Level

    Get PDF
    We implement asymmetric pairings derived from Kachisa-Schaefer-Scott (KSS), Barreto-Naehrig (BN), and Barreto-Lynn-Scott (BLS) elliptic curves at the 192-bit security level. Somewhat surprisingly, we find pairings derived from BLS curves with embedding degree 12 to be the fastest for our serial as well as our parallel implementations. Our serial implementations provide a factor-3 speedup over the previous state-of-the-art, demonstrating that pairing computation at the 192-bit security level is not as expensive as previously thought. We also present a general framework for deriving a Weil-type pairing that is well-suited for computing a single pairing on a multi-processor machine
    corecore